Tag
1 article
Meta AI has open-sourced GCM, a GPU cluster monitoring tool designed to improve performance and hardware reliability in AI training environments.